Optimal Compression of Floating-point Astronomical Images 
Without Significant Loss of Information 
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ABSTRACT 

We describe a compression method for floating-point astronomical images that gives compres- 
sion ratios of 6 - 10 while still preserving the scientifically important information in the image. 
The pixel values are first preprocessed by quantizing them into scaled integer intensity levels, 
which removes some of the uncompressible noise in the image. The integers are then losslessly 
compressed using the fast and efficient Rice algorithm and stored in a portable FITS format 
file. Quantizing an image more coarsely gives greater image compression, but it also increases 
the noise and degrades the precision of the photometric and astrometric measurements in the 
quantized image. Dithering the pixel values during the quantization process can greatly improve 
the precision of measurements in the images. This is especially important if the analysis algo- 
rithm relies on the mode or the median which would be similarly quantized if the pixel values 
are not dithered. We perform a series of experiments on both synthetic and real astronomical 
CCD images to quantitatively demonstrate that the magnitudes and positions of stars in the 
quantized images can be measured with the predicted amount of precision. In order to encourage 
wider use of these image compression methods, we have made available a pair of general-purpose 
image compression programs, called fpack and funpack, which can be used to compress any FITS 
format image. 



Subject headings: image compression, FITS format 
1. Introduction 

The exponential growth in the volume of as- 
tronomical data being generated from ever larger 
format imaging detectors continues to drive up 
the data handling costs of both large and small 
telescope projects. The cost of archiving the im- 
ages is only one part of the problem. A typical 
observational work flow involves a long cascade 
of temporary and permanent copies of the data 



replicated from observatory to data processing fa- 
cility to deep storage site to science archive cen- 
ter to virtual observatory portal to multiple end 
users. Each science image provided as input to 
a pipeline will produce several output images as 
a result of processing operations such as resam- 
pling onto a standard grid, co-adding, mosaicking, 
and any analysis steps specific to the science pro- 
gram. The problem is one of throughput - not just 



storage - of the total system data flow. Network 
transmission costs also rival or exceed the cost of 
storage media and can be breathtakingly large for 
spacecraft or remote mountaintops. Often there is 
an upper limit to the network bandwidth at any 
price. Making effective use of the available image 
compression technologies is an important compo- 
nent in dealing with these data handling costs. 

Lossy data compression techniques, which do 
not exactly preserve each image pixel value but 
do still preserve the required scientific informa- 
tion content of the image, are already used by 
many projects. To cite just a fe w examples, the 
Kepl er space telescope project ' Caldwell et al. I 
20101 ) uses a lossy image quantization method 
to reduce the data volume to fit within the lim- 
ited bandwidth from its heliocentric orbit, the 



GONG helioseismology pr oject (jHarvev et al. 



19961 : [Goodrich et al. 1 12004| ) relies on lossy com- 
pression to transmit images from its telescopes de- 
ployed at six locations worldwi de, and the NOAO 
High -Performance Pipeline ( Valdes fc Swaterd 
20071 ) includes a final quantization step from an 
internal floating-point representation to the final 
integer archive data products to achieve greater 
image compression. 

In a previous contribution (jPence. Seaman, fc White! 

20091 hereafter, Paper I), we demonstrated that 
the maximum possible lossless compression ratio 
for integer format astronomical images is deter- 
mined by the amount of noise in the background 
pixels (i.e., the "sky") in the image. The noise 
in astronomical images often has 2 main compo- 
nents: Poissonian-distributed photon noise and 
Gaussian distributed "read-out" noise. If each 
pixel is represented by BITPIX bits (usually 16 or 
32 bits) and if, on average, Nbus of those bits are 
filled with uncompressible, randomly fluctuating 
noise, then the maximum theoretical compression 
ratio for that image is given by BITPIX / Nbus- 
In practice, no compression algorithm is 100% ef- 
flcient, so the actual maximum compression ratio, 
R, is given by 



R = BITPIX/ (iV( 



bits 



K) 



(1) 



where K is an empirical measure of the efficiency 
(or overhead) of the algorithm in units of bits 
per pixel. We compared several lossless compres- 
sion algorithms and found that the Rice algorithm 
(JRice. Yeh. fc Milleill 199.4 1 White fc Beckerlll99j^ . 



which has a small K value of about 1.2 bits per 
pixel, provided the best combination of speed and 
compression efficiency. 

The relationship between noise and entropy in 
images is discussed more fully in the app endix of 
Paper I, based on the seminal work by [Shannon 
(|l948[ ). where we showed that the "equivalent" 
number of noise bits per pixel in an image can be 
calculated from the RMS noise {a) of the pixels in 
background regions of the image such that 



Nuts = log2(aVl2) = log2(a) + 1.792 (2) 
which, when combined with equation 1, gives 

i?=BITPIX/(log2(CT) + 1.792 + i^) (3) 

In this current article we extend the previous 
analysis of integer images to study compression 
techniques for astronomical images in floating- 
point format. In the cases we are mainly con- 
cerned with here, these images originally had in- 
teger pixel values, but were converted into 32-bit 
IEEE floating-point format during the calibration 
processing (e.g., bias subtraction, flat-fielding, ab- 
solute flux calibration, etc.). Equation |3] applies 
to floating point images as well as integer im- 
ages, however typical floating-point astronomical 
images contain so much noise that it is impossible 
to achieve signiflcant amounts of compression with 
lossless algorithms (often less than a factor of 2). 
One explanation for this excess noise is that a 32- 
bit IEEE floating-point number can express 7 deci- 
mal places of precision but this usually far exceeds 
the inherent precision of individual image pixel 
values. As a result many of the least significant 
bits in the mantissa of the fioating-point pixel val- 
ues are effectively filled with quasi-random noise 
which is inherently uncompressible. While there 
are counter-examples of fioating-point FITS arrays 
that have very little noise and can be losslessly 
compressed effectively (e.g., generated by theoret- 
ical simulations), these are not typical of the types 
of floating-point images commonly found in large 
astronomical data archives and are not the subject 
of this article. 

There are many published articles on loss- 
less floating-point data com pression schemes (see 
Lindstrom fc Isenburgll2006l and reference therein 
as recent examples) but it is beyond the scope here 
to summarize them in detail. The main point is 



that ultimately all of these lossless compression 
techniques face the same Shannon entropy limit, 
and the only way to achieve greater compression 
of noisy floating-point images is to use techniques 
that discard some of the noise. These methods 
are technically "lossy" because they do not ex- 
actly preserve the pixel values, however, if only 
noise is discarded, then the compression can still 
be considered lossless from a scientific standpoint 
because all the useful information is retained. 

In the remainder of this article we describe a 
lossy compression technique for floating-point as- 
tronomical images that provides an optimal com- 
bination of speed, compression ratio, and preser- 
vation of information content. It is faster and 
achieves much higher compression than generic 
lossless file c ompression algorithms like GZIP 
(JGaillv fc Adler 1992 ). yet it produces no sig- 
nificant loss of information in the image when 
used appropriately. In ^ we describe in detail 
the quantization and compression techniques that 
have been implemented in our publicly available 
fpack and /unpack image compression utility pro- 
grams. Then in fj3] we describe the results of 
several experiments that demonstrate that astro- 
nomical floating-point images can be compressed 
by up to a factor of 10 without significant loss 
of astrometric or photometric precision. Finally, 
51] summarizes the results and gives recommenda- 
tions for achieving the best compression of astro- 
nomical images. 

2. Quantization and Compression Meth- 
ods 

The most common lossy compression technique 
for floating-point images is to preprocess the pixel 
values by quantizing them into a smaller set of dis- 
crete values prior to applying a lossless compres- 
sion algorithm. In the simplest case, the values 
are rounded into a grid of equally spaced floating- 
point levels. This reduces the number of different 
bit patterns in the image pixels (i.e., reduces the 
entropy in the image) and improves the efficiency 
of file compression programs like GZIP which ac- 
cumulate a dictionary of the most common bit pat- 
terns in the file and represent t hem using a sh orter 



code in the compressed file. IWatsonI (|2002i ) ap- 
plied an analogous technique to integer images. 
In order to accommodate compression algorithms. 



like Rice, that only operate on integer arrays, the 
quantized floating-point values are usually repre- 
sented by scaled integers so that the image pixel 
values are approximated by 

FloatValue = ScaleFactorx IntegerValue+ZeroPoint 

(4) 
Note that this integer scaling technique was the 
only way to represen t floating-point image s in the 
FITS data format (JHanisch etlTI l200ll) before 
support for the IEEE floating-point format was 
officially added in 1990. 

As an aside, there are many articles i n the 



Nieto- Santist cban et al. _ 1999t 



literature (e.g., 

Gowen &: Smitlj|2003l:lNicula. B crghm ans. fc Hochedez 



2005 : ISeaman. White, fc Pencell2009l : lBernstein et al. I 



20101 ) that advocate using a square root scal- 
ing function, in part because the Poissonian shot 
noise scales by this same factor. What is usually 
not stated in these studies is that practically the 
same increase in compression that is obtained af- 
ter applying a square root scaling function can 
be obtained by linearly scaling all the pixels in 
the image by the same factor as is applied to the 
background pixels during the square root scaling. 
Since the compression ratio is determined mainly 
by the noise in the background areas of the image 
(from equation |3]), the amount of scaling that is 
applied to the relatively infrequent bright pixels 
usually makes little difference to the overall com- 
pression ratio of the image. In experiments on 
simulated astronomical images, iBernstein et al. I 
([20101) found that using square-root scaling only 
produced significantly better compression than 
linear scaling when more than 10% of the image 
pixels are affected by isolated bright objects or 
cosmic rays. 

2.1. FITS Tiled Image Compression Con- 
vention 



White k Greenfield! (|l999l ) developed the tech- 



nique that forms the basis of the FITS tiled-image 
compression format that is used here along with a 
few new refinements. Each row of the image (or 
in principle, any other rectangular "tile" in the 
image) is compressed separately to provide fast 
random access to individual sections of an image 
without having to uncompress the entire image. 
In the case of fioating-point images, the pixel val- 
ues are converted to integers with a scale factor 



that, by default, is proportional to the measured 
amount of noise in that tile. 

The noise in each tile of the image is calculated 
using a robust algorithm that was originally devel- 
oped to autonomously m easure the signal-to-noise 
in spectroscopic data (fStoehr et al. Ii2007 ). In par- 
ticular, we use their third order "Median Absolute 
Difference" (MAD) formula to compute the stan- 
dard deviation of the pixel values: 



a = 0.6052 X median(— Xi 



^•Jj •} <*j • 



i+2) 



(5) 



where i is the vector index of the pixel within each 
row of the image, and Xi is the value of the i*'' 
pixel. The median value is computed over all the 
pixels in each row of the image. In the limiting 
case where the pixel values have a Gaussian dis- 
tribution, this formula converges to same value as 
the Standard Deviation of the pixels. Note that 
this formula is not affected by linear intensity gra- 
dients across the image (which are canceled out by 
the first and third terms), and the use of the me- 
dian makes the result insensitive to the presence 
of outlying large pixel values. For example, if one 
randomly sets 5% of the pixels in an image with 
otherwise Gaussian-distributed noise to very large 
values to simulate the effects of cosmic rays, then 
the MAD noise estimate only increases by about 
20%. 

Once the noise level has been calculated, the 
floating-point pixels are quantized into scaled in- 
tegers where the quantized levels are spaced at 
some user-specified fraction, q, of the noise, so 
that the spacing is given by cr/q. Normalizing 
the quantization spacing to cr is a convenient way 
to produce similar quality compressed images re- 
gardless of the intrinsic noise level in the image. 
The scaled integers are then compressed using the 
default Rice algorithm, or one of the other op- 
tional compression algorithms. Finally, the com- 
pressed stream of bytes is stored in a FITS bi- 
nary table structure as define d in the FITS tiled 
image compression c onvention (IPence et al. II200Q: 



Seaman et al. 112007 ) 



Since the noise a in the array of quantized in- 
tegers is simply equal to q, the expected image 
compression ratio, from equation |3l is given by 

i?=BITPIX/(log2(q) + 1.792 + X) (6) 

For example, quantizing a 32-bit floating-point im- 
age using q values of 1 or 4 will produce compres- 



sion ratios of about 10 or 6.4, respectively, when 
using the Rice algorithm that has iiT ~ 1.2. If the 
same image is stored in 64-bit double precision for- 
mat it will compress by twice this amount (i.e., the 
compressed files will have the same size). In most 
cases, the compression ratio does not depend very 
much on the distribution of objects or structures 
within the image itself. Note that this formula 
tends to break down for q values much less than 1 
because many compression algorithms become less 
effective on very low entropy images and because 
the size of the FITS file header, which remains un- 
compressed, becomes relatively more significant. 

One important caveat with the use of this quan- 
tizing method is that if the noise level in the image 
is significantly overestimated for some reason (for 
our purposes it is generally sufficient if it is accu- 
rate to within about a factor of 2), then the im- 
age may be inadvertently quantized more coarsely 
than expected for a given q value, thus possibly 
causing a loss of information in the compressed 
image. Circumstances where the MAD algorithm 
could overestimate the noise include the following: 

• If a large fraction of the pixels in an im- 
age tile are covered by bright objects and 
have values (and noise) which is many times 
greater than in the fainter pixels, then the 
MAD noise estimate will be representative 
of those bright pixels and not that of the 
fainter "background" pixels. This may cause 
the fainter pixels to be more coarsely quan- 
tized than would be desired. 

• If a significant amount of the pixel-to-pixel 
variation in the image is real and not due to 
noise, (e.g., if there are large pixel-to-pixel 
variations in the detector sensitivity, or if the 
observed object contains significant struc- 
ture on a pixel size scale), then the noise 
level may be overestimated. 

• If the noise in an image is anti-correlated 
between adjacent pixels (e.g., after certain 
types of image convolutions) then the noise 
will be overestimated. Note, however, that 
equation [5] is a function only of the values in 
every other pixel in each row of the image, so 
the anti-correlation scale length must extend 
over at least 2 pixels to affect the MAD noise 
estimate. 



Our fpack compression program (see ^2A\ offers 
several options for dealing with this issue. One 
simple method is to just compress the image us- 
ing a larger q value to compensate for the pos- 
sible MAD noise overestimate, although this can 
negatively affect the overall compression ratio of 
the image. Another option in cases where only 
a small fraction of the rows in an image might 
be affected is to compress the entire image as a 
single tile, rather than using the default row-by- 
row tiling pattern, so that the MAD noise estimate 
then more accurately reflects the noise in the back- 
ground regions of the image as a whole. Finally, 
fpack users do not need to rely on the MAD noise 
estimate at all, and instead can directly specify 
the desired spacing between the quantization lev- 
els. This latter option is especially appropriate 
for projects that generate large amounts of rela- 
tively homogeneous images because it ensures that 
all the images will be compressed using the same 
quantization factor. This method has the added 
advantage that it will improve the compression 
speed because it is not necessary to compute the 
MAD noise value in this case. 

2.2. Benefits of Dithering 

The effects of linear quantization are naturally 
greater in the fainter areas of an image than for 
the brighter pixels where the Poissonian uncer- 
tainty of the photon counts can be much larger 
than the quantized spacing. Measurement of the 
local "sky" background around the image of a star 
or galaxy can be especially vulnerable to the ef- 
fects of quantization, in part because it is usu- 
ally necessary to identify and correct for pixels 
that are affected by other objects or by defects in 
the detector. The issue of detecting small ampli- 
tude signals in a quantized system is a well-studied 
problem in engineering and communications fields 
where the phenomenon is known as "stochastic 
resonance" . The somewhat counter-intuitive so- 
lution for improving the signal-to-noise of mea- 
surements of quantized data is to add a moder- 
ate amount of noise into the system. When ap- 
plied to image s, this technique is comm only called 
"dithering". IWidrow fc Kollaij (|2008l ) devote 2 
chapters of their book to the theory and prac- 
tice of dithering and recommend using a clever 
"s ubtractiy e dithe ring" technique, first proposed 
by iRobertsI ( 19621 ) . This technique overcomes the 



drawback of having to add noise to the image: a 
dither is added to the quantizer input, and the 
same dither is subtracted again from the quan- 
tizer output. The dither thus behaves as a cata- 
lyst which makes the process work better but does 
not appear in the output image. We have adopted 
this technique when quantizing fioating-point im- 
ages by adding a random dither, i?^, with a value 
uniformly distributed between and 1 during the 
scaling process, 

Ii = round(Fj/ScaleFactor + Ri - 0.5) (7) 

where Fi is the original floating-point value and li 
is the the quantized integer value. The interest- 
ing trick that distinguishes subtractive dithering 
from ordinary dithering methods is that exactly 
the same random dither value is subtracted when 
converting back to the quantized float value; 



Fi — {li + 0.5 — Ri) X ScaleFactor 



(8) 



The net effect of this subtractive dithering oper- 
ation is to shift the entire grid of linearly spaced 
intensity levels up or down by a random amount 
on a pixel by pixel basis. It should be noted that 
the dithered values are uniformly distributed be- 
tween the quantized levels in this implementation. 
One possible future enhanceme nt may be to use a 
trian gular or Gaussian dither (jWidrow fc Kollar 
20081 ) which may more closely replicate the actual 
distribution of pixel values in the image. 

In order to use this subtractive dithering 
method it is necessary to define a specific pseudo 
random number generator (PRNG) algorithm for 
use by both the compressor and the uncompres- 
sor so that the same predictable sequence of ran- 
dom numbers is used in both case s. We adopted 
the P RNG algorithm described by IPark fc Miller 
( 19881 ) which has been shown to produce statis- 
tically independent random numbers uniformly 
distributed between and 1. However, for prag- 
matic reasons we do not compute a unique random 
number for every single image pixel because (a) 
it would add significant computational overhead 
relative to the very efficient Rice compression al- 
gorithm, and (b) true randomness is not required 
for this purpose since even crude dithering pat- 
terns would still help to mitigate the measurement 
biases in quantized images. Our compromise so- 
lution is to calculate a look-up-table of 10000 ran- 
dom numbers using the above mentioned PRNG, 



and then repeatedly recycle through this LUT 
when calculating the amount of dither for each 
pixel in the image. As a precaution against in- 
troducing regular cyclical noise patterns into the 
image, a pseudo random starting offset is com- 
puted each time when recycling through the LUT. 
Finally, 1 out of a possible 10000 initial seed val- 
ues for the entire dithering process is computed 
based on the system clock time (or optionally the 
checksum of the first row of pixel values in the 
image) to ensure that the same dithering pattern 
is not repeated in every image. This initial seed 
value is stored in the header of the compressed 
image for reuse when uncompressing the image. 

2.3. Quantization Noise 

While quantization reduces the entropy in the 
representation of the pixel values (thus improv- 
ing the image compression ratio) , any quantization 
operation, even with subtractive dithering, modi- 
fies the pixel values and thus inherently increases 
the RMS noise in the pixel-to-pixel variations in 
the image by an amount given by: 



AV12 



(9) 



where the total variance cr^ in the quantized image 
is the sum of the variance CTq in the original image 
plus the quantization noise variance, and where 
A is the intensity spacing between the quantized 
levels. The factor 1/12 comes from the variance 
of a uniform random distribution with unit width 



(|Janesickll200lL also see the appendix to Paper I) . 
Substituting A — ao/q, the fractional increase in 
the noise caused by quantizing the image can be 
expressed as 



ag/ao = Vl + 1/(1V 



(10) 



Figure 1 shows how the fractional noise ratio (from 
equation [TUl) and the compression ratio (from 
equation ^ of an image containing Gaussian- 
distributed noise both depend on q. To fore- 
shadow the results of the experiments that will be 
described in ^ this figure shows that q values in 
the range of 4 to 1 provide a good combination of 
high compression ratio with relatively little added 
noise. 

It should be cautioned that if a floating- 
point image is repeatedly compressed and uncom- 
pressed, then the cumulative quantization noise 
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Fig. 1. — Relationship between the compression 
ratio and the fractional increase in the background 
noise in an image containing Gaussian-distributed 
noise for a range of q quantization parameter val- 
ues. 



will be given by 



ag/ao = v/TTa7(1V) 



(11) 



where N is the number of compression and uncom- 
pression cycles. (This assumes that the dithering 
is re-randomized during each cycle, which is al- 
ways the case in our implementation of the sub- 
tractive dithering algorithm). The amplitude of 
this effect depends strongly on the q value. To 
put this in practical terms, an image can be com- 
pressed and uncompressed at least 16 times using 
q > 4 before the noise would increase to the same 
level equivalent to compressing the image once 
with q = 1. But if the image is compressed multi- 
ple times using q = 1, the noise would increase to 
scientifically unacceptable levels after just a few 
cycles. Ideally, an image should only be com- 
pressed once, and then all the subsequent data 
analysis should be performed directly on the tile- 
compressed FITS file. If the software cannot read 
the compressed format directly, then it should op- 
erate on an uncompressed version of the original 
compressed file, which then should not be recom- 
pressed. 



2.4. fpack and funpack Compression Util- 
ity Programs 

In order to make the image quantization and 
compression techniques that are described in the 
previous sections more widely available to the 
astronomical community, we have developed a 
pair of general pur pose utility program s, called 



fpack and funpack (jSeaman et al. I I20Q7I ). which 



can be used to compress and uncompress any 
FITS image in integer or floating-point format. 
These u tilities rely o n the underlying CFITSIO 
library ( Pence! Il999l ) to perform the quantiza- 
tion and compression operations. The fpack 
and funpack utility programs were used in the 
experiments that are described in the follow- 
ing sections to quantize and compress the im- 
ages. Further information about fpack and fun- 
pack is available from the HEASARC web site at 
[http: //heasarc .gsfc. nasa.gov/fitsio/fpack 

3. Experiments on Quantized Images 

In this section we present the results of experi- 
ments designed to show how the measurements of 
objects in an image are affected as the image is 
quantized by varying degrees. In particular, we 
will verify that the noise in the image increases as 
a function of q by the amount predicted by equa- 
tion IIOI and more significantly, that the statisti- 
cal errors on the magnitude and position measure- 
ments of faint objects in an image, which are lim- 
ited mainly by the background noise, also increase 
by a similar factor. These results will provide gen- 
eral guidelines for achieving the greatest amount 
of image compression while still preserving the re- 
quired level of scientific precision in the image. 

Section 13.11 describes the method of construct- 
ing the simulated CCD images that are used in 
the first 2 experiments. The first experiment, in 
H3.2\ examines how quantization affects the uncer- 
tainties of measurements of single star images, and 
the second experiment, in ^ 13. 31 examines the case 
where many quantized images are added together 
to detect sources far below the detection threshold 
of a single image. Finally, the third experiment, 
in H'SAl is performed on a set of actual astronomi- 
cal images to verify the results obtained from the 
synthetic images. 



3.1. Data Model and Measurement Meth- 
ods 

In order to determine how quantization affects 
the precision of measurements of objects in an im- 
age, we generated a large sample of realistic CCD 
star images with known input positions and mag- 
nitudes. This allows us to precisely calculate the 
errors on the measured positions and magnitudes 
in the quantized images. All the stars have circu- 
lar Gaussian profiles with a — 1.0 and FWHM — 
2.35 pixels. This is typical of the spatial resolution 
commonly found in astronomical CCD images and 
is adequate to avoid the difficulties when analyz- 
ing spatially undersampled images. The central 
location of the star images, relative to the pixel 
grid, was varied so as to average out any subtle 
biases in the subsequent star detection and mea- 
surement steps that might depend on the exact 
position. The total integrated flux in the stars cov- 
ered a range of 10 magnitudes (a factor of 10000 
in intensity) in 0.5 magnitude increments. Finally, 
the sky background was simulated by adding 1000 
counts to each pixel. 

Poissonian-distributed shot noise and Gaussian 
distributed "read-out" noise was then added to 
each of these star images to simulate real CCD 
images. The shot noise in each pixel was ran- 
domly calculated using a a equal to the square 
root of that pixel value (which implicitly assumes 
that the "gain" of the simulated CCD has been 
set to 1 electron per analog-to-digital readout 
count), and the readout noise was calculated us- 
ing a = 10. The read-out noise in these images 
is relatively small compared to the shot noise in 
the sky background, which is usually the case 
for real astronomical CCD images that have a 
moderately bright background level. The total 
noise in the background areas of these images has 
a = VlOOO + 102 = 33.2. Different starting ran- 
dom seed values were used so that the actual noise 
distribution varies in every image. 

The wi dely used SExtractor so urce extraction 
program (jBertin fc ArnoutsI 119961 ) was employed 
to objectively detect and measure the position and 
magnitude of the stars in these simulated, noisy 
CCD images. SExtractor calculated the positions 
(given by the XJMAGE and YJMAGE output 
parameters) from the flux-weighted centroid of all 
the pixels in each star image above an empirically 



determined flux detection threshold. The total 
magnitude of each star (given by the MAG_APER 
parameter) was measured within a 7-pixel diame- 
ter aperture (i.e., out to 3.5 ct of the Gaussian pro- 
file) around the centroid position. We calculated 
the actual errors on these measurements from the 
differences between the known input position and 
the input flux (corrected for the 0.3% of the flux 
that falls outside the 3.5a aperture) of each star 
in the synthetic image. For reference, the faintest 
stars that SExtractor could reliably detect in these 
images contained about 1000 net counts, which 
is coincidentally equal to the mean sky counts in 
each pixel. We arbitrarily set the zero point of the 
instrumental magnitude scale, 

m = -2.5 log(flux) + ZeroPoint (12) 

so that these faintest detected stars have a mag- 
nitude of 20.0. 

We repeated the SExtractor measurements of 
the simulated CCD images after quantizing and 
compressing each image with fpack using q val- 
ues ranging from 8 to 0.25, both with and with- 
out the subtractive dithering option. Since the 
SExtractor program cannot directly read images 
in the compressed FITS format, it was necessary 
to convert them back into the standard FITS im- 
age format using /unpack (but the pixel values 
remain quantized, of course). In the case where 
dithering was not applied, we also selected the 
option to compress the entire image as a single 
large tile so that all the pixels are quantized into 
the same fixed grid of intensity values. If we had 
used the default row-by-row tiling option instead, 
it effectively would have introduced some dither- 
ing of the quantized levels between adjacent rows, 
and the results would be intermediate between the 
dithered and non-dithered cases presented below. 

Figure 2 shows the dramatic effect that subtrac- 
tive dithering has on the histogram of the pixel 
values in a q = 1 quantized image; the dithered 
image histogram is nearly identical to the that of 
the original image, whereas the non-dithered his- 
togram clearly shows the coarse intensity binning. 
Without dithering, the image breaks up into dis- 
crete "bands" of constant intensity which makes it 
more difficult to detect faint features in the image. 
This effect is shown in Figure 3, (using an even 
more coarsely quantized q = 0.5 image to enhance 
the effect), where the majority of the background 
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Fig. 2. — Histograms of the pixel intensity dis- 
tribution in one of our test images which demon- 
strates the beneficial effects of subtractive dither- 
ing when quantizing the image with q = 1. When 
dithering is applied (the finely stepped histogram), 
the pixel distribution closely matches that in the 
original image (the continuous curve). When 
dithering is not applied (the coarsely stepped his- 
togram) then subtle gradations in image intensity 
are lost. 




Fig. 3. — Images of a pair of faint (m — 20) ar- 
tificial stars (left panel) and the same image after 
quantizing the image with q = 0.5 with dither- 
ing (middle panel) and without dithering (right 
panel). Without dithering, the background breaks 
up into broad patches of constant intensity level, 
and the star images become harder to detect. 



pixels all have the same (medium gray) intensity 
value. 



3.2. Experiment 1: Single Images 

In this first experiment, we examine how the 
uncertainties of the photometric and astrometric 
measurements of individual stars depend on the 
q quantization factor. To measure this effect, we 
computed the Standard Deviation of the magni- 
tude and position errors (i.e., the value computed 
by SExtractor minus the known input magnitude 
or position value) for 6250 simulated star images 
at each 0.5 magnitude increment. Figure 4 shows 
how the magnitude uncertainties decrease as the 
brightness of the star increases (towards the right). 
The lower, thicker line in the figure was derived 
from the original, unquantized images and shows 
that the statistical uncertainty on the magnitudes 
decreases from a — 0.23 for the faintest detectable 
stars (with m = 20), to cr = 0.018 for stars that 
are 3 magnitudes brighter. The line has a slope 
close to 1.0 (in log - log coordinates) as expected 
in the limiting case where the noise is dominated 
by the sky, and hence the signal-to-noise ratio of 
the measurement increases in direct proportion to 
the signal. 

The other 3 lines in Figure 4 were derived from 
the same images after they were first quantized 
and compressed with successively coarser q values 
of 1.0, 0.5, and 0.25, when also applying the sub- 
tractive dithering option. The vertical displace- 
ment of these lines shows that the measurement 
errors are systematically larger in the quantized 
images as compared to the measurements in the 
original image. This increase is relatively small 
for q > 1, but increases rapidly for coarser quan- 
tization values. 

Figure 5 shows a similar plot of the Standard 
Deviation of the stellar centroid measurements, in 
units of pixels, as a function of the magnitude of 
the star and the q quantization factor. In the origi- 
nal unquantized images, the positional uncertainty 
decreases from a — 0.24 pixels for the m = 20 
stars, to CT = 0.035 pixels for the stars that are 3 
magnitudes brighter. The positional measurement 
uncertainties are larger in the quantized image by 
about the same factor as the increase in the mag- 
nitude uncertainties shown in Figure 4. 

In order to better quantify the effects shown 
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Fig. 4. — The effect of q on the measured magni- 
tude uncertainties when dithering is applied. 
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Fig. 5. — The effect of q on the measured posi- 
tional uncertainties when dithering is applied. 



in these 2 figures, Table 1 summarizes how the 
noise and the measurement uncertainties increase 
as the images are quantized more coarsely. The 
measured percentage increase in the MAD noise 
level in the quantized images is given in column 2 
and agrees exactly with the predicted value from 
equation [TOl More strikingly, the magnitude and 
position uncertainties for the faintest stars also in- 
crease by about the same factor as the noise, as 
shown in columns 3 and 4. This means that equa- 
tion [10] also provides a good estimate of how much 
the measurement uncertainties of the faintest ob- 
jects in an image, which are limited by the back- 
ground noise, will increase when the image is 
quantized. 

Finally, columns 5 and 6 in Table 1 give the 
corresponding increase in measurement uncertain- 
ties for stars that are 5 magnitudes, or a factor 
of 100, brighter than the image detection thresh- 
old. As expected, these brighter objects are rel- 
atively less affected by quantization because the 
inherent Poissonian noise in the brighter pixels is 
larger than the spacing between the quantized lev- 
els. Also, the small formal statistical errors on 
the magnitudes and positions of the brighter stars 
(which are less than 0.001 of a magnitude or pixel, 
respectively, in this case) are often insignificant 
compared to the systematic errors in the absolute 
calibration of the measurements. 

For comparison. Figures 6 and 7 show that if 
the quantized pixels are not dithered then the 
measurement errors are larger than in the dithered 
case shown previously in Figures 4 and 5. It is 
striking that while dithering provides a modest 
increase in the accuracy of the centroid measure- 
ments, it greatly improves the magnitude mea- 
surements. Further investigation has shown that 
the larger magnitude errors are almost entirely 
caused by a dramatic increase in the errors in the 
SExtractor background estimate around each star 
if the quantized image is not dithered. SExtractor 
assumes that the best estimate of the true back- 
ground level is given by the peak in the histogram 
of the pixel values (i.e., the mode) and it uses the 
following empirically derived formula 
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Fig. 6. — Same as Figure 4 but without dithering. 
The magnitude uncertainties are much larger than 
in the dithered case. 
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mode — 2.5 x median — 1.5 x mean 



(13) 



Fig. 7. — Same as Figure 5 but without dithering. 



to correct for the slight skewness in the pixel dis- 
tribution (i.e., an excess of brighter pixels) that is 
commonly seen due to contamination by other ob- 
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Table 1 
Increased noise and measurement uncertainties as function of 
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jects in the image. This formula generally works 
weU if the pixel values have a continuous distribu- 
tion, as demonstrated in our tests on the dithered 
star images, however, it behaves poorly if the pix- 
els are quantized because then the median value is 
also quantized. The amplitude of the error in the 
background estimate depends on how closely one 
of the quantized levels happens to match the true 
background level. In the worst case, where the 
true value lies half way between 2 of the quantized 
levels (and assuming that the mean value lies very 
close to the true background level, which is usually 
the case in our images), then the error is equal to 
1.25 times the intensity spacing between adjacent 
quantized levels. On average, the RMS error in 
the background estimate introduced by this effect 
win be (2 X 1.25)/VT2 = 0.72 times the quantized 
spacing. It is ironic that equation [13] is intended 
to provide a more accurate background estimate 
than simply using the median value alone, but it 
actually increases the errors by a factor of 2.5 if 
the image is significantly quantized. 

It should be emphasized that while this partic- 
ular problem applies specifically to the way SEx- 
tractor estimates the background intensity, it is 
likely that similar problems would arise with other 
astronomical data analysis packages unless they 
are specifically designed to deal with quantized 
data. Another way to view this problem is that 
many algorithms are not well behaved if the noise 
is under-sampled. Dithering helps to eliminate 
this problem because it preserves the natural dis- 
tribution of the pixel values, as shown in Figure 
2. 

As a final test, it is important to establish that 
there are no subtle systematic zero-point biases in 
the measured magnitudes in the quantized images. 
This is demonstrated in Figure 8, which plots the 



mean magnitude residual (true magnitude minus 
measured magnitude) for the 6250 simulated star 
images within each 0.5 magnitude bin after quan- 
tizing the images with a range of different q values. 
The upper panel shows the mean residuals in the 
subtractively dithered images, and the lower panel 
shows the residuals in the non-dithered quantized 
images. The thicker line in both panels was de- 
rived from the original unquantized images and 
shows that there is a small intrisic negative bias 
(-0.02 mag) in the SExtractor magnitudes of the 
faintest stars. The cause of this small measure- 
ment bias, which is less than 1/lOth the statisti- 
cal error on a single magnitude measurement, is 
unknown, but it might be due to the algorithms 
that SExtractor uses to measure the fainter stars, 
or perhaps due to the particular set of SExtrac- 
tor configuration parameters that we used in these 
tests. The important point here is that a very 
similar pattern of residuals is seen (in the upper 
panel) in the coarsely quantized and dithered im- 
ages, which shows that there are no significant ad- 
ditional magnitude biases in these quantized im- 
ages. For q > 2, the mean residuals match those in 
the unquantized images so closely that the points 
lie within the thickness of the line in the figure. 
If dithering is not applied, however, then signif- 
icant biases in the magnitude residuals may be 
present in the quantized images, as shown in the 
lower panel of Figure 8. This effect is entirely due 
to the systematic errors caused by the use of the 
median function in estimating the background in- 
tensity level, as previously discussed. 

3.3. Experiment 2: Co-added Images 

The second experiment investigates how well 
faint simulated Gaussian stars (as described in 
Wi.l\ that are below the detection threshold of 
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Fig. 8. — The mean magnitude residual (true 
magnitude minus measured magnitude) derived 
from the simulated star images after quantizing 
the images with different q values. The upper 
panel shows the case where the pixel values were 
subtractively dithered and the lower panel shows 
the case with no dithering. The bold line in both 
cases shows the small inherent bias in the SExtrac- 
tor magnitudes in the original images, without any 
quantization. The lines for q = 2 and q = 4 are 
hidden under this bold line in the upper panel. 



a single image can be measured after co- adding 
250 similar images together. To first order, one 
would expect that this experiment should detect 
stars that are about 2.51og\/250 — 3.0 magni- 
tudes fainter than in the single images. For com- 
parison, we also generated quantized co-added im- 
ages by quantizing each of the 250 individual im- 
ages (using a range of q values, with and with- 
out subtractive dithering) before co-adding them. 
The magnitude and position of the stars in each 
of these co-added images were then measured with 
the SExtractor program, just as in the first exper- 
iment. 

Figure 9 shows the results of the photometric 
measurements in the co-added images. Other than 
the fact that the lines in the plots exhibit more 
statistical scatter, since they were derived from a 
sample of only 250 star measurements in each 0.5 
magnitude bin instead of 6250, these results are 
very similar to the results from the single images 
shown in Figure 4, after allowing for the expected 
shift of the horizontal axis by 3 magnitudes. This 
similarity confirms that the photometric measure- 
ments in the co-added images have the same de- 
pendence on q as in the individual images. In par- 
ticular, it confirms that information about faint 
stars that are well below the detection threshold 
in a single image is still preserved in the quantized 
images and can be detected after co-adding many 
quantized images together. 

We also confirmed that the astrometric preci- 
sion in the co-added images shows the same de- 
pendence on q as the measurements in the single 
images. A plot of the positional uncertainty as a 
function of magnitude (not shown here) is almost 
identical to Figure 5, except for the 3 magnitude 
shift of the horizontal axis. 

Finally, Figure 10 shows the magnitude uncer- 
tainties when the quantized images are not sub- 
tractively dithered before co-adding them. Unlike 
the case for single images (as compared in Figures 
4 and 6), dithering has little effect on the precision 
of the photometry in the co-added image. This 
is because the co-adding process effectively intro- 
duces its own form of pixel dithering if the quan- 
tized levels in the different images are randomly 
offset with respect to each other. Thus, there is lit- 
tle added benefit by also dithering the pixel values 
within each image. IPrice-Whelan fc Hoga ( 20101 ) 
demonstrate an even more extreme case where ac- 
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curate photometry and astrometry could be per- 
formed on a co-added sum of 1024 images, each 
quantized with q = 1/16 = 0.0625 without any 
dithering. 
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Fig. 9. — The effect of q on the measured magni- 
tude uncertainties in the sum of 250 images, with 
dithering. 
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Fig. 10. — Same as Figure 9, but without dither- 
ing. Dithering the 250 individual images has little 
effect on the precision of the magnitude measure- 
ments in the summed image. 
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Fig. 11. — Sample region in one of the test im- 
ages taken with the Isaac Newton Telescope at La 
Palma. This shows about 10% of the total image 
area. 



3.4. 



Experiment 
Images 



Real Astronomical 



In the third and final experiment we performed 
tests on a set of real astronomical CCD images 
to confirm the previous results derived from syn- 
thetic images. We used a sequence of 20 similar 
CCD images taken with the 2.5-m Isaac Newton 
Telescope at La Palma that show a random dis- 
tribution of faint stars and galaxies (Figure 11). 
All the exposures were for 600 seconds through 
a V-band filter with the same pointing on the 
sky (12''51™,26°24'). These images are publicly 
available through the virtual observatory portal 
at Ihttp : //portal-nvo . noao . edu 

We performed 2 tests to measure the effects of 
quantization on these images. In the first test 
we coarsely quantized one of the images using q 
= 1 (with subtractive dithering), by compress- 
ing it with fpack and then uncompressing it again 
with funpack. As predicted by equation [SI fpack 
achieved a compression ratio of about 10, which 
is equivalent to 3.2 bits per pixel in the com- 
pressed image. Also as expected from equation 
1101 the measured background noise level increased 
by 4.1%, from a = 22.79 in the original image to 
a — 23.73 in the quantized image. 

We then compared the SExtractor magnitude 



measurements in the original image to those de- 
rived from the quantized image. Since we do not 
know the true magnitudes of the stars in this im- 
age (unlike in the experiments on the synthetic 
stars), we can only compare the measurements 
of the stars in the 2 images, both of which have 
measurement uncertainties. The top panel of Fig- 
ure 12 shows the difference between the 2 magni- 
tude measurements for each star as a function of 
the magnitude of the star, and the middle panel 
shows the corresponding relative errors (the mag- 
nitude difference divided by the statistical error on 
that magnitude as calculated by SExtractor) . The 
larger points in these panels show the mean differ- 
ence averaged over 0.5 mag bins, which demon- 
strate that there is no significant systematic bias 
between the 2 sets of magnitudes measurements. 
The RMS value of all the relative errors in the 
middle panel is 0.34a. This is slightly larger than 
the factor of a/1/12 = 0.29 that one would ex- 
pect simply from the added quantization noise and 
may be due to other residual sources of noise in 
the SExtractor calculations. The fact that this is 
much less than the approximately la dispersion 
that one would expect when comparing the mag- 
nitudes derived from 2 identical CCD images of 
the same stars, confirms that quantizing the im- 
age with q = 1 has not introduced statistically 
significant differences in the magnitude measure- 
ments. Finally, the lower panel plots the ratio of 
the magnitude errors in the quantized and original 
images which shows that the statistical errors of 
the faintest stars are on average about 4% greater 
in the q = 1 quantized image, as expected. 

For the second test we created 2 new images 
by co-adding the 20 original CCD images and by 
coadding the c^ = 1 quantized version of each of 
the images and then repeated the same tests as 
described above. The results, shown in Figure 13, 
are similar to those in Figure 12 except that the 
faintest detected stars are now 2.5 log a/20 = 1.6 
mag fainter, as a result of co-adding the 20 images. 
The RMS value of the relative errors shown in the 
middle panel is 0.33cr in this case, which again 
demonstrates that quantizing the images with q 
= 1 has not produced any significant photometric 
errors, even in objects that are fainter than the 
detection threshold in a single image. 
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4. Summary and Discussion 
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Fig. 12. — Differences in star magnitudes mea- 
sured in a CCD image and a q = 1 quantized ver- 
sion of the same image. The upper panel shows 
the absolute magnitude differences, and the mid- 
dle panel shows the relative differences after divid- 
ing by the statistical error. The lower panel shows 
the ratio of the statistical errors in the quantized 
image relative to those in the original image. 
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Fig. 13. — Same as Figure 12, except the magni- 
tudes are measured in the sum of 20 CCD images 
and in the sum of the q = 1 quantized version of 
each of the images. 



The main purpose of this work has been to find 
a more effective compression method for floating- 
point astronomical images than is provided by the 
lossless methods that are commonly used (such 
as GZIP). These floating-point images typically 
do not compress well with lossless algorithms be- 
cause a large fraction of the bits in each pixel 
value representation contain no significant infor- 
mation and are effectively filled with uncompress- 
ible noise. We have adopted the method of elim- 
inating some of this noise by quantizing the pixel 
values into a set of discrete, linearly spaced inten- 
sity levels, which are represented by scaled integer 
values. The scaled integers can then be efficiently 
compressed using the very fast Rice algorithm. In 
order to make these compression techniques more 
widely available, we have produced a pair of util- 
ity programs called fpack and /unpack that can be 
used to compress any FITS format image. 

For convenience, we define a quantization pa- 
rameter, q, which is equal to the measured RMS 
noise in background regions of the image divided 
by the spacing between the quantized intensity 
levels. Given a particular q value when quantizing 
and compressing an image, one can calculate from 
fundamental principles the expected image com- 
pression ratio, as given by equation |6l Coarser 
quantization (i.e., smaller q values) gives greater 
image compression, but at the same time it in- 
creases the RMS pixel-to-pixel noise by an amount 
given by equation [TU] which also tends to degrade 
the precision of the magnitude and position mea- 
surements of the objects in the image. 

Our series of experiments on simulated and on 
real astronomical CCD images demonstrate that 
the noise equation [10] also gives a good estimate 
of the increase in measurement uncertainties for 
objects near the detection threshold in a quan- 
tized image (which are noise limited). For many 
practical applications, q values between 1 and 4 
provide a good combination of high compression 
and low increased noise: q = 1 gives a compres- 
sion ratio of about 10 while increasing the noise 
and the measurement uncertainties by about 4%, 
and q = 4 gives a compression ratio of 6 with a 
negligible 0.26% increase in the noise and mea- 
surement uncertainties. In "quick-look" types of 
applications, where high scientific accuracy is not 
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of primary importance, even greater compression 
can be obtained by using q values less than 1. 

One necessary requirement for achieving these 
results, at least when using the SExtractor pro- 
gram to perform the measurements, is that the 
pixel values must be dithered to be able to ac- 
curately measure the background intensity level 
around each object. The algorithm that SExtrac- 
tor uses to estimate the background relies on the 
median of the pixels values, which is not a reliable 
statistic if the values are quantized. Since other as- 
tronomical image analysis software may have sim- 
ilar issues dealing with quantized images, it would 
be prudent to always dither the pixel values during 
the quantization process to avoid possible compli- 
cations later on, especially since dithering intro- 
duces no significant undesirable side effects. 

Several other recent studies have reached 
similar conclusions about the benefits quantiz- 
ing astronomical images to reduce the amount 
of noise and improv e the compression ratio. 
Price- Whelan fc Hoga pOld ) performed photo- 
metric and astrometric measurements on stars in 
simulated CCD images (using a slightly differ- 
ent methodology than used here) and concluded 
that no significant errors were introduced if the 
image is quantized (w it hout dithering) with q 



> 2. iBernstein et al. I ( 20101 ) found that using 



q = 1, along with square- root scaling does not 
induce any significant bias in weak-lensing shape 
measurements of galaxies (i.e., the second sta- 
tistical moment of the image) in simulated im- 
ages from futu r e spac e-based imaging missions. 
Caldwell et al. I ( 20101 ) describe how the Kepler 
mission is performing on-board quantization of 
the images with q values as low as 1.15 to obtain 
the necessary amount of compression of the down- 
linked telemetry while still preserving the required 
high precision in the differential photometry mea- 
surements of stars. While we recognize that ap- 
plying any sort of lossy compression scheme to 
scientific data tends to go against the inclination 
of scientists and data archive professionals, hope- 
fully the results of the experiments performed here 
and in the other cited references will help to alle- 
viate these concerns. 

While we believe our test results should be ap- 
plicable to a wide range of floating-point astro- 
nomical images, we strongly encourage users to 
perform their own tests to verify that this com- 



pression technique is appropriate for their own 
data. Users can easier create a quantized and 
dithered version of any floating-point FITS image 
by compressing it with fpack and then uncompress- 
ing it with funpack. This image can then then be 
processed just like the original image to see if there 
are any signiflcant differences in the results. 

We wish to thank the (second) anonymous ref- 
eree for a thorough review that improved the pa- 
per and helped us identify the reason for large 
SExtractor magnitude errors when dithering is not 
used. 
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